[SPARK-54938][PYTHON][TESTS] Add tests for pa.array type inference #53718
JIRA Issue Information: Sub-task SPARK-54938 (this comment was automatically generated by GitHub Actions)
zhengruifeng
left a comment
Thanks so much, it is much cleaner.
Inspired by https://github.com/apache/spark/pull/53727/changes,
I think we also need to test the following cases:
1, string: non-English values;
2, integral: min and max values, plus values that overflow;
3, floats: nan, -inf, inf;
4, time: Unix epoch, min and max values.
efbc505 to 905b616
905b616 to f12b2a0
dev/sparktestsupport/modules.py
# unittests for upstream projects
"pyspark.tests.upstream.pyarrow.test_pyarrow_ignore_timezone",
"pyspark.tests.upstream.pyarrow.test_pyarrow_scalar_type_inference",
"pyspark.tests.upstream.pyarrow.test_pyarrow_type_inference",
Let's rename the file to test_pyarrow_array_type_inference.
renamed.
zhengruifeng
left a comment
otherwise, LGTM
thanks, merged to master
What changes were proposed in this pull request?
Add tests for PyArrow's pa.array type inference behavior. These tests monitor upstream PyArrow behavior to ensure PySpark's assumptions remain valid across versions.
The tests cover type inference across input categories:
- None values
- list, tuple, dict (struct)
Types tested include:
Pandas extension types tested:
- pd.Int8Dtype() ... pd.Int64Dtype(), pd.UInt8Dtype() ... pd.UInt64Dtype(), pd.Float32Dtype(), pd.Float64Dtype(), pd.BooleanDtype(), pd.StringDtype()
- pd.ArrowDtype(pa.int64()), pd.ArrowDtype(pa.float64()), pd.ArrowDtype(pa.large_string()), etc.
Why are the changes needed?
This is part of SPARK-54936 to monitor behavior changes from upstream dependencies. By testing PyArrow's type inference behavior, we can detect breaking changes when upgrading PyArrow versions.
Does this PR introduce any user-facing change?
No. This PR only adds tests.
How was this patch tested?
New unit tests added:
Was this patch authored or co-authored using generative AI tooling?
No.